168 research outputs found

    Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

    Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve a significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the decoder inputs is important and largely impacts model accuracy. In this paper, we propose two methods to enhance the decoder inputs so as to improve NAT models. The first directly leverages a phrase table generated by conventional SMT approaches to translate source tokens into target tokens, which are then fed into the decoder as inputs. The second transforms source-side word embeddings into target-side word embeddings through sentence-level alignment and word-level adversarial learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show our method largely outperforms the NAT baseline (Gu et al., 2017) by 5.11 BLEU points on the WMT14 English-German task and 4.72 BLEU points on the WMT16 English-Romanian task. Comment: AAAI 2019
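    A minimal sketch (not the authors' code) of the first idea: each source token is looked up in an SMT-style phrase table, and the resulting target-side tokens replace the copied source embeddings as the NAT decoder input. The table format, greedy phrase choice, and unknown-token fallback below are assumptions for illustration.

    def decoder_inputs_from_phrase_table(source_tokens, phrase_table, unk="<unk>"):
        """Map each source token to a target-side phrase via the phrase table;
        the resulting token sequence is fed to the NAT decoder as its input."""
        inputs = []
        for tok in source_tokens:
            candidates = phrase_table.get(tok)
            if candidates:
                # pick the highest-probability target phrase for this source token
                best_phrase, _ = max(candidates, key=lambda pair: pair[1])
                inputs.extend(best_phrase.split())
            else:
                inputs.append(unk)  # no table entry: fall back to an unknown token
        return inputs

    # toy usage with a hypothetical German-English phrase table
    phrase_table = {"blaues": [("blue", 0.95)], "haus": [("house", 0.9), ("home", 0.1)]}
    print(decoder_inputs_from_phrase_table(["das", "blaues", "haus"], phrase_table))
    # -> ['<unk>', 'blue', 'house']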

    Ground state properties of a multi-component bosonic mixture: a Gutzwiller mean-field study

    Using the single-site Gutzwiller method, we theoretically study the ground-state and interspecies entanglement properties of interexchange-symmetric multi-component (two- and three-component) bosonic mixtures in an optical lattice, and the results are generalized to an $n$-component ($n=2,3,4,\cdots$) system. We compute the mean-field phase diagram, the interspecies entanglement entropy, and the ground-state spectral decomposition. Three phases are observed: the $n$-component superfluid state ($n$SF), the $n$-component Mott insulator state ($n$MI), and the super-counter-fluid state (SCF). Interestingly, we find that there are $n-1$ SCF lobes separating every two neighboring $n$MI lobes in the phase diagram. More importantly, we derive the exact general expression of the interspecies entanglement entropy for the SCF phase. In addition, we investigate the demixing effect of an $n$-component mixture and demonstrate that the mixing-demixing critical point is independent of $n$. Comment: 12 pages, 6 figures
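    As a rough illustration of one quantity in the abstract, the sketch below (not the paper's code) computes the interspecies entanglement entropy of a single-site Gutzwiller state for a two-component mixture, writing the state as |psi> = sum_{n1,n2} c[n1,n2] |n1>|n2> and taking the Schmidt decomposition of the amplitude matrix. The truncated Fock basis and the toy state are assumptions.

    import numpy as np

    def interspecies_entropy(c):
        """c[n1, n2]: Gutzwiller amplitudes over the two species' Fock states.
        Tracing out species 2 gives rho_1 with eigenvalues equal to the squared
        singular values of c; the entropy is S = -sum p log p."""
        c = c / np.linalg.norm(c)                    # normalize the state
        s = np.linalg.svd(c, compute_uv=False)       # Schmidt coefficients
        p = s**2
        p = p[p > 1e-12]
        return float(-np.sum(p * np.log(p)))

    # toy usage: a maximally entangled two-mode state has entropy log 2
    c = np.zeros((2, 2))
    c[0, 1] = c[1, 0] = 1 / np.sqrt(2)
    print(interspecies_entropy(c))  # ~0.693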

    STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

    Distributed parallel stochastic gradient descent algorithms are workhorses for large-scale machine learning tasks. Among them, local stochastic gradient descent (Local SGD) has attracted significant attention due to its low communication complexity. Previous studies prove that the communication complexity of Local SGD with a fixed or an adaptive communication period is on the order of $O(N^{\frac{3}{2}} T^{\frac{1}{2}})$ and $O(N^{\frac{3}{4}} T^{\frac{3}{4}})$ when the data distributions on clients are identical (IID) or otherwise (Non-IID), where $N$ is the number of clients and $T$ is the number of iterations. In this paper, to accelerate convergence by reducing the communication complexity, we propose STagewise Local SGD (STL-SGD), which increases the communication period gradually along with the decreasing learning rate. We prove that STL-SGD keeps the same convergence rate and linear speedup as mini-batch SGD. In addition, as a benefit of the increasing communication period, when the objective is strongly convex or satisfies the Polyak-Łojasiewicz condition, the communication complexity of STL-SGD is $O(N \log T)$ and $O(N^{\frac{1}{2}} T^{\frac{1}{2}})$ for the IID case and the Non-IID case, respectively, achieving significant improvements over Local SGD. Experiments on both convex and non-convex problems demonstrate the superior performance of STL-SGD. Comment: Accepted by AAAI 2021
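    A schematic sketch (not the released implementation) of the stagewise schedule described above: each stage runs Local SGD with a fixed communication period and learning rate, and between stages the period grows while the learning rate shrinks. The stage lengths, the doubling factor, and the 1/(s+1) decay are illustrative assumptions.

    import numpy as np

    def stl_sgd(grad, x0, clients, stages=5, steps_per_stage=200,
                eta0=0.1, period0=1):
        x = [np.array(x0, dtype=float) for _ in range(clients)]  # local copies
        for s in range(stages):
            eta = eta0 / (s + 1)          # decreasing learning rate per stage
            period = period0 * 2 ** s     # increasing communication period
            for t in range(steps_per_stage):
                for k in range(clients):
                    x[k] -= eta * grad(x[k], client=k)   # local SGD step
                if (t + 1) % period == 0:                # periodic averaging
                    avg = sum(x) / clients
                    x = [avg.copy() for _ in range(clients)]
        return sum(x) / clients

    # toy usage: quadratic objectives with client-dependent minima at 0, 1, 2, 3
    grad = lambda w, client: 2 * (w - client)
    print(stl_sgd(grad, x0=[0.0], clients=4))  # converges near the mean minimum 1.5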

    DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

    While diffusion generative models have achieved great success on image generation tasks, how to efficiently and effectively incorporate them into speech generation, and especially translation tasks, remains a non-trivial problem. Specifically, due to the low information density of speech data, the transformed discrete speech unit sequence is much longer than the corresponding text transcription, posing significant challenges to existing auto-regressive models. Furthermore, it is not optimal to naively apply discrete diffusion to the speech unit sequence while disregarding the continuous-space structure, which degrades generation performance significantly. In this paper, we propose a novel diffusion model that applies the diffusion forward process in the continuous speech representation space, while employing the diffusion backward process in the discrete speech unit space. In this way, we preserve the semantic structure of the continuous speech representation space in the diffusion process and integrate the continuous and discrete diffusion models. We conduct extensive experiments on the textless direct speech-to-speech translation task, where the proposed method achieves results comparable to the computationally intensive auto-regressive baselines (500 steps on average) with significantly fewer decoding steps (50 steps). Comment: Accepted at the EMNLP 2023 main conference
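    The sketch below (conceptual only, not the paper's model) illustrates the hybrid scheme in the abstract: a Gaussian forward step applied in the continuous embedding space of speech units, and a backward step that snaps the denoised estimate to the nearest entry of a discrete unit codebook. The identity "denoiser", the noise-schedule value, and the random codebook are placeholders.

    import numpy as np

    def forward_diffuse(z0, alpha_bar_t, rng):
        """Continuous forward step: z_t = sqrt(a_bar) z_0 + sqrt(1 - a_bar) eps."""
        eps = rng.standard_normal(z0.shape)
        return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1 - alpha_bar_t) * eps

    def backward_step(z_t, denoiser, codebook):
        """Discrete backward step: predict z_0, then map each frame to the
        nearest codebook entry so the sequence stays on the unit manifold."""
        z0_hat = denoiser(z_t)                                   # continuous estimate
        d = np.linalg.norm(z0_hat[:, None, :] - codebook[None], axis=-1)
        units = d.argmin(axis=1)                                 # discrete unit indices
        return codebook[units], units

    # toy usage with an identity "denoiser" and a random 8-entry codebook
    rng = np.random.default_rng(0)
    codebook = rng.standard_normal((8, 16))
    z0 = codebook[rng.integers(0, 8, size=20)]                   # 20 unit frames
    z_t = forward_diffuse(z0, alpha_bar_t=0.5, rng=rng)
    z_snap, units = backward_step(z_t, denoiser=lambda z: z, codebook=codebook)
    print(units[:5])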